Complete anchors/headings with data from Webref #2426
This completes the update mechanism that produces the cross-reference anchors and headings so that it also uses data from Webref. To keep changes minimal and avoid introducing potentially conflicting anchors and headings, data from Webref is only used for specs that are not in Shepherd's database. In practice, this adds definitions and headings from 267 specifications (~60 of which only have headings), generating ~3.5MB of anchors data and ~5.2MB of headings data.
Whoa, this is super cool! I was in the middle of doing exactly this yesterday, but stopped for the day before I was finished, and now you've beaten me to it. I tested specs generated with the current db against this db, and mostly it looks fine. There are a lot more "ambiguous for" errors now, which cause things to stop linking, but that's the sort of problem you run into in general; it's unavoidable. However, there are a handful of linking changes I think are legitimately wrong:
We have a patching process in place for IDL terms to create curated IDL extracts and, indeed, one of them drops Problem here is that we don't have a similar curation mechanism in place for definitions (this is being tracked in w3c/webref#789). Maintaining patches consumes time so we tend to resist the temptation to create more places where they could appear ;) I'll try to find a workaround in Reffy!
Argh, adding the glossary was requested by the i18n group precisely for cross-referencing purposes, see w3c/webref#465. Instead of picking a term at random, would it be possible for Bikeshed to prefer normative definitions over informative ones, and exported ones over non-exported ones? In that particular case, that would mean the definitions in Infra would always be chosen over the ones in i18n-glossary.
There now is a crude mechanism in place to patch definitions that need to be. Webref definitions data no longer contains duplicate |
Excellent! Re: the infra/i18n conflict, I already prefer exported dfns over unexported, but don't track whether a definition is normative or informative. (The concept of an informative definition is somewhat contradictory!) However, I do have a mechanism for preferring dfns from one spec over another when they're both possibilities, either at the individual dfn level or at the full spec level. I'll go ahead and deploy that to prefer Infra over i18n-glossary for the conflicting terms.
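To make the preference order described above concrete, here is a minimal sketch, not Bikeshed's actual code. The `Dfn` class, the `SPEC_PREFERENCES` table, and the `pick_candidates` function are all hypothetical names invented for illustration, the example term is made up, and only the spec-level preference is modelled (not the per-dfn one).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dfn:
    """Hypothetical, simplified view of a definition candidate."""
    term: str
    spec: str
    exported: bool


# Hypothetical spec-level preferences: (preferred spec, spec that loses to it).
SPEC_PREFERENCES = {("infra", "i18n-glossary")}


def pick_candidates(candidates):
    """Narrow a list of candidate definitions for one term.

    1. Keep exported definitions if there are any, else keep everything.
    2. Drop candidates from a spec that loses to another spec still in the pool.
    More than one remaining candidate means the link is still ambiguous.
    """
    exported = [d for d in candidates if d.exported]
    pool = exported or list(candidates)
    specs = {d.spec for d in pool}
    losers = {
        loser
        for winner, loser in SPEC_PREFERENCES
        if winner in specs and loser in specs
    }
    return [d for d in pool if d.spec not in losers]
```

With two exported candidates for the same term, one from `infra` and one from `i18n-glossary`, only the `infra` one is returned, so the link would no longer be ambiguous.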
This proposes to complete the update mechanism that produces the cross-reference anchors and headings so that it also uses data from Webref, as proposed in #1761. To keep changes minimal and avoid introducing potentially conflicting anchors and headings (and thus avoid breaking existing specs), data from Webref is only used for specs that are not in Shepherd's database.
In practice, this adds definitions and headings from 267 specifications (~60 of which only have headings), generating ~3.5MB of anchors data and ~5.2MB of headings data. Full list below. Only 3 CSS-related specs are not (yet) in Shepherd: CSS Anchor Positioning, CSS Images Module Level 5, and CSS Parser API.
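The "Shepherd first, Webref only for missing specs" rule can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `merge_spec_data` is a hypothetical helper, and both inputs are assumed to be plain dicts keyed by spec shortname.

```python
def merge_spec_data(shepherd, webref):
    """Merge per-spec data, taking Webref entries only for specs Shepherd lacks.

    Both arguments are assumed to map a spec shortname to its anchors
    (or headings) data. A spec's data comes from exactly one source,
    so the two sources are never mixed for the same spec.
    """
    merged = dict(shepherd)  # Shepherd data always wins
    for shortname, data in webref.items():
        if shortname not in merged:
            merged[shortname] = data
    return merged
```

For example, if both sources know `css-values-4` but only Webref knows `webgpu`, the result keeps Shepherd's `css-values-4` entry and adds Webref's `webgpu` entry.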
Produced anchors and headings look good to me but I don't really know how to validate them in practice.
There won't be duplicates in the sense that anchors/headings data will either come from Shepherd or from Webref but not from both. Adding new anchors data means that there may be more situations where a term is defined in more than one specification though. Existing specifications should be able to continue linking to terms they already use but some may need to add a few additional linking defaults. If that seems useful, it should be relatively easy to build a list of terms defined in more than one spec to better understand what specifications are going to be potentially affected.
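As a rough idea of how the list of terms defined in more than one spec could be built, here is a small sketch. `terms_in_multiple_specs` is a hypothetical helper, the input shape (spec shortname mapped to a list of defined terms) is an assumption, and terms are compared by exact string match, which is probably stricter than real link resolution.

```python
from collections import defaultdict


def terms_in_multiple_specs(terms_by_spec):
    """Return {term: sorted list of specs} for terms defined in 2+ specs.

    `terms_by_spec` is assumed to map a spec shortname to the terms it
    defines. Terms are matched by exact string equality.
    """
    specs_by_term = defaultdict(set)
    for spec, terms in terms_by_spec.items():
        for term in terms:
            specs_by_term[term].add(spec)
    return {
        term: sorted(specs)
        for term, specs in specs_by_term.items()
        if len(specs) > 1
    }
```

Running this over the merged anchors data would give a first approximation of which specs might need additional linking defaults.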
(Side note that I'm not fluent in Python so code may need some re-writing. I tried to be explicit about types for instance but not sure I got that part right, and not sure when these types are actually checked).
List of Webref specs added to cross-ref database
tc39-intl-negotiation
tc39-intl-numberformat
tc39-intl-pluralrules
test-methodology
change-password-url
webcodecs-alaw-codec-registration
webcodecs-aac-codec-registration
shape-detection-api
text-detection-api
accelerometer
accname-1.2
wai-aria-1.2
ambient-light
html-aria
tc39-array-grouping
tc39-atomics-wait-async
attribution-reporting-api
audio-output
autoplay-detection
webcodecs-av1-codec-registration
webcodecs-avc-codec-registration
background-fetch
badging
battery-status
beacon
capability-delegation
capture-handle-identity
tc39-change-array-by-copy
clear-site-data
client-hints-infrastructure
fido-v2.1
clipboard-apis
close-watcher
compression
compute-pressure
contact-api
content-index
csp-embedded-enforcement
contentEditable
cookie-store
rfc6265bis
core-aam-1.2
crash-reporting
css-anchor-1
css-images-5
css-parser-api
custom-state-pseudo-class
datacue
tc39-decorators
deprecation-reporting
device-memory-1
device-posture
orientation-event
digest-headers
digital-goods
document-policy
DOM-Parsing
is-input-pending
tc39-explicit-resource-management
ecma-402
ecmascript
edit-context
element-timing
encrypted-media
epub-33
epub-rs-33
tc39-array-from-async
event-timing
tc39-intl-extend-timezonename
eyedropper-api
entries-api
file-system-access
webcodecs-flac-codec-registration
gamepad-extensions
gamepad
geolocation
geolocation-sensor
get-installed-related-apps
graphics-aam-1.0
gyroscope
webcodecs-hevc-codec-registration
html-aam-1.0
html-media-capture
sanitizer-api
video-rvfc
webrtc-ice
webrtc-identity
idle-detection
tc39-import-assertions
IFT
ink-enhancement
input-device-capabilities
input-events-2
i18n-glossary
intervention-reporting
tc39-intl-enumeration
tc39-intl-locale-info
tc39-intl-duration-format
tc39-iterator-helpers
js-self-profiling
tc39-json-modules
json-ld11-framing
json-ld11
tc39-json-parse-with-source
largest-contentful-paint
layout-instability
webcodecs-pcm-codec-registration
webpackage
local-font-access
longtasks-1
magnetometer
manifest-incubations
mathml-aam
mathml-core
performance-measure-memory
media-capabilities
mediacapture-automation
mediacapture-fromelement
media-feeds
media-playback-quality
mediasession
media-source-2
image-capture
mediastream-recording
mst-content-hint
mediacapture-transform
miniapp-lifecycle
miniapp-manifest
miniapp-packaging
webcodecs-mp3-codec-registration
window-placement
navigation-api
netinfo
openscreenprotocol
webcodecs-opus-codec-registration
orientation-sensor
overscroll-scrollend-events
page-lifecycle
paint-timing
payment-handler
payment-method-manifest
payment-request-1.1
performance-timeline
picture-in-picture
pointerlock-2
PNG-spec
portals
prefer-current-tab
prefetch
prerendering-revamped
scheduling-apis
priority-hints
private-click-measurement
private-network-access
tc39-array-find-from-last
proximity
push-api
rdf11-concepts
n-quads
rdf11-mt
rdf-canon
mediacapture-region
tc39-regexp-modifiers
requestidlecallback
tc39-resizablearraybuffer
resource-hints
resource-timing
responsive-image-client-hints
rfc6265
rfc6266
rfc7230
rfc7231
rfc7232
rfc7233
rfc7234
rfc7235
rfc7538
rfc7540
rfc7616
rfc7617
rfc7725
rfc7838
rfc8246
rfc8288
rfc8470
rfc8942
rfc9110
rfc9111
rfc9112
rfc9113
rfc9114
rfc9163
savedata
webrtc-svc
screen-capture
screen-wake-lock
csp-next
webcrypto-secure-curves
secure-payment-confirmation
selection-api
server-timing
tc39-set-methods
tc39-shadowrealm
speculation-rules
svg-aam-1.0
svg-animations
svg-integration
svg-strokes
tc39-symbols-as-weakmap-keys
tc39-temporal
scroll-to-text-fragment
model-element
mediacapture-handle-actions
storage-access
timing-entrytypes-registry
touch-events
tracking-dnt
trusted-types
webcodecs-ulaw-codec-registration
uievents-code
uievents-key
urlpattern
user-preference-media-features-headers
user-timing
ua-client-hints
vibration
mediacapture-viewport
virtual-keyboard
webcodecs-vorbis-codec-registration
webcodecs-vp8-codec-registration
webcodecs-vp9-codec-registration
w3c-patent-policy
w3c-process
graphics-aria-1.0
web-app-launch
manifest-app-info
web-bluetooth
web-locks
webmidi
webnn
web-nfc
periodic-background-sync
web-share-target
speech-api
wasm-core-1
wasm-js-api-2-fork-exception-handling
wasm-web-api-2
webcodecs-codec-registry
webcodecs
webdriver-bidi
webgl2
webgl1
WGSL
webgpu
webhid
web-otp
webrtc-encoded-transform
webrtc-priority
webtransport
webvtt1
anchors
webxr-depth-sensing-1
webxr-lighting-estimation-1
raw-camera-access
tc39-is-usv-string
window-controls-overlay
WOFF